Across-word phoneme models for large vocabulary continuous speech recognition

Author

  • Achim Sixtus
Abstract

In this work, the application of across-word phoneme models in large vocabulary continuous speech recognition is studied. A recognition system will be developed that allows for the training of high-performance across-word phoneme models, the efficient application of these models in combination with long-span language models in a single search pass, and the construction of word graphs. Within-word phoneme models consider the context dependency of the phonemes representing the words in the vocabulary only within the words and use a reduced phonetic context at word boundaries. Across-word phoneme models, in contrast, also consider the context dependency of the phonemes across word boundaries. As has been known for many years, this results in significant word error rate improvements but also in considerably higher computational effort. Today, across-word phoneme models are applied by a number of groups. However, the published descriptions of these recognition systems are often quite general, and many implementation details needed for the successful application of across-word phoneme models are usually missing.

In this work, all details of the transformation of a baseline within-word model system into an across-word model system will be discussed. It will be analyzed in detail how the introduction of across-word phoneme models affects the word error rate, runtime, and memory requirements of the recognition system. First, the across-word model paradigm will be integrated into the very general Bayes' decision rule, which is the basis of speech recognition. Taking into account all model assumptions and approximations needed for the application of across-word models, a specialized decision rule will be derived. Based on this specialized decision rule, the across-word model system will be developed. Compared to the baseline within-word model system, the introduction of across-word phoneme models results in a significantly more complex search network.
The efficient application of across-word phoneme models in combination with long-span language models in a single search pass requires a careful design of both the search network and the search algorithm, which will be discussed in detail. The across-word model search developed in this work will be compared to the baseline within-word model search with regard to active search space, computational effort, and memory requirements.

In contrast to the baseline within-word model training, the phonetic representation of the training utterances is no longer unique when across-word models are to be trained. Two different training procedures that handle this ambiguity will be presented. The procedures will be compared with regard to their computational effort and the recognition performance of the resulting across-word phoneme models. In addition, it will be discussed how the parameterization of the baseline within-word model training should be modified to obtain optimally performing across-word models.

The introduction of across-word models also affects the construction of word graphs. Starting from baseline word graphs constructed during a within-word model search, a method for constructing word graphs during across-word model search will be developed. To further optimize the runtime of the developed across-word model search, several acceleration methods will be applied, some of which have already been discussed in the literature for within-word model systems. In addition, methods for further increasing the accuracy of across-word models, based on refined pronunciation modeling, will be studied.

Finally, the developed across-word system will be evaluated on three different speech corpora by comparing its recognition results to those of the baseline within-word model system. On two of the corpora, these results will also be compared to the results of other research groups as published in the literature. It will be seen that the developed recognition system produces state-of-the-art word error rates.
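
The difference between within-word and across-word context dependency described in the abstract can be illustrated with a small sketch. The Python below is not taken from the thesis: the toy lexicon, the `#` boundary symbol, the `left-center+right` label format, and the function names are all assumptions made for illustration. It expands a word sequence into triphone labels, once with the context cut at word boundaries and once with the context extending into the neighbouring words.

```python
# Illustrative sketch only: the lexicon, the boundary symbol, and the label
# format are assumptions, not the representation used in the thesis.
LEXICON = {
    "the": ["dh", "ax"],
    "cat": ["k", "ae", "t"],
}
BOUNDARY = "#"  # hypothetical symbol meaning "no phonetic context"


def _triphones(phones, left, right):
    """Label each phoneme as left-center+right, given the outer contexts."""
    padded = [left] + phones + [right]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


def within_word_triphones(words):
    """Within-word models: the context is cut at every word boundary."""
    return [_triphones(LEXICON[w], BOUNDARY, BOUNDARY) for w in words]


def across_word_triphones(words):
    """Across-word models: the context extends into the neighbouring words.

    Only the first and last phoneme of the whole utterance see the
    boundary symbol.
    """
    result = []
    for i, word in enumerate(words):
        left = LEXICON[words[i - 1]][-1] if i > 0 else BOUNDARY
        right = LEXICON[words[i + 1]][0] if i + 1 < len(words) else BOUNDARY
        result.append(_triphones(LEXICON[word], left, right))
    return result


if __name__ == "__main__":
    utterance = ["the", "cat"]
    print(within_word_triphones(utterance))
    print(across_word_triphones(utterance))
```

In this sketch, the model label for the last phoneme of "the" depends on the following word only in the across-word case, so the same word needs different boundary models depending on its neighbours.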

Similar resources

Speech Input Acoustic Analysis Phoneme Inventory Pronunciation Lexicon

This paper gives an overview of an architecture and search organization for large vocabulary, continuous speech recognition (LVCSR at RWTH). In the first part of the paper, we describe the principle and architecture of an LVCSR system. In particular, the issues of modeling and search for phoneme-based recognition are discussed. In the second part, we review the word-conditioned lexical tree search...

Speech Input Acoustic Analysis Phoneme Inventory Pronunciation Lexicon Language Model

This paper gives an overview of an architecture and search organization for large vocabulary, continuous speech recognition (LVCSR at RWTH). In the first part of the paper, we describe the principle and architecture of an LVCSR system. In particular, the issues of modeling and search for phoneme-based recognition are discussed. In the second part, we review the word-conditioned lexical tree search...

Prosody Dependent Speech Recognition on Radio News

Does prosody help word recognition? Humans listening to natural prosody, as opposed to monotone or foreign prosody, are able to understand the content with lower cognitive load and higher accuracy [1]. For automatic Large Vocabulary Continuous Speech Recognition (LVCSR), the answer is not that straightforward. Even though successful word recognition and successful prosody recognition have been ...

A Rejection Method for the Isolated Word Recognition System

An efficient rejection method is implemented for the HMM-based small vocabulary isolated word recognition system. Six clustered phoneme models are generated using a statistical method from the 45 context-independent Korean phoneme models, which were trained using the phonetically balanced Korean speech database, and the classification through likelihood ratio scoring is performed based on the cluste...

Development of large vocabulary continuous speech recognition system for Mongolian language

We developed a large vocabulary continuous speech recognition (LVCSR) system for the Mongolian language. It is the first LVCSR system for the Khalkha dialect in Mongolia. First, we created a Mongolian speech corpus for the acoustic model; it contains over 6000 utterances in total, recorded from 700 different sentences spoken by 40 male speakers. We then created monophone- and triphone-based HMMs. Second...

Publication date: 2003